Abstract. In this paper we introduce an information theoretic
approach and use techniques from the theory of Huffman codes
to construct a sequence of binary sampling vectors to determine
a sparse signal. Unlike the standard approaches, ours is adaptive
in the sense that each sampling vector depends on the previous
sample results. We prove that the expected total cost (number
of measurements and reconstruction combined) we need for an
$s$-sparse vector in $\mathbb{R}^n$ is no more than $s \log n + 2s$.



1. Introduction

Let $x$ be a vector in $\mathbb{R}^n$ and assume that $x$ has at most
$s \ll n$ nonzero components, denoted $\|x\|_0 \le s$. In compressed
sampling, the goal is to determine a set of linear functionals (the
sampling functions) and associated reconstruction algorithms such that,
if the set of functionals is applied to $x$, then the reconstruction
algorithms allow us to find $x$ from the values of the functionals
(measurements of $x$) in a computationally tractable way that is stable
(i.e., it should produce a good approximation even when $x$ is not
$s$-sparse) and robust to noise (i.e., it should produce a good
approximation to $x$ when the measurements are corrupted by additive
noise). There is a trade-off between the number of sampling vectors
needed to acquire $x$ and the computational cost of the reconstruction
algorithm that determines $x$ from the samples.

Some of the earlier work uses an $\ell_1$ minimization for finding $x$
from a set of samples $\{y_i = \langle a_i, x\rangle,\ i = 1, \dots, m\}$
where the $a_i$ are vectors in $\mathbb{R}^n$; see, e.g., [CR06, CRT06,
DeV07, Don06, GN03, Tro04]. Letting $y = (y_1, \dots, y_m)^t$ and $A$ be
the $m \times n$ matrix whose rows are the vectors $a_i$, the $\ell_1$
minimization approach finds the unknown $s$-sparse vector $x$ (i.e.,
$\|x\|_0 \le s$) by solving the constrained minimization problem

(1.1) $\min_z \|z\|_1$ subject to $Az = y$.
The minimization problem (1.1) has $x$ as its unique solution provided
$A$ is an appropriate sampling matrix, e.g., $A$ satisfies the
Restricted Isometry Property (RIP). The current methods for
constructing matrices satisfying the needed RIP are probabilistic and
produce matrices with $m = O(s \log(n/2s))$ rows (i.e., we need
$m = O(s \log(n/2s))$ sampling vectors). The deterministic construction
of a matrix $A$ satisfying the needed RIP for arbitrary $n$ (with
$m \ll n$) is still open. Finding $x$ via the $\ell_1$ minimization
problem involves linear programming with $n$ variables and $m$
constraints, which can be computationally expensive.

Matching pursuits (OMP, ROMP, CoSaMP, etc.; see [NT08], [NTO] and the
references therein) form another class of sampling and reconstruction
algorithms for finding an $s$-sparse signal $x$ from the measurements
$y = Ax$. These algorithms are iterative in nature and view the columns
of $A$ as a dictionary for the reconstruction of $x$. At every step of
the reconstruction algorithm, a certain number of columns (column
indices) are chosen in order to minimize an expression of the form
$\|r_n - A x_n\|_2$, where $r_n = y - A x_{n-1}$ is the residual from
the previous step. The best performing algorithms of this type are
relatively fast, stable, and robust to noise. However, like the
$\ell_1$ minimization algorithms, they require a sampling matrix $A$
satisfying the needed RIP. As before, only probabilistic methods are
known, and they produce matrices with $m = O(s \log(n/2s))$ rows that
satisfy the RIP with high probability.

Other approaches for compressed sampling are combinatorial. A sampling
matrix $A$ is constructed using bipartite graphs, such as expander
graphs, and the reconstruction finds an unknown $s$-sparse vector $x$
using binary search methods; see, e.g., [BGIKS08, GKMS03, DWB05,
SBB06b, SBB06a, GSTV06, GSTV07, XH07] and the references therein.
Typically, the matrix $A$ has binary entries. There exist fast
(typically sublinear) algorithms for finding the solution $x$ from the
measurements. However, constructing $A$ remains difficult.

New adaptive methods in compressed sampling are emerging. One approach
uses a Bayesian method combined with a Gaussian model for the
measurements and a Laplacian model for the sparsity [JXC08]. The
sampling vectors are chosen by minimizing a differential entropy.
Another approach is a type of binary search algorithm that uses back
projection and block adaptive sampling to focus on the possible nonzero
components (see [HCN09] and the references therein).
The new approach we present is information theoretic and uses tools
from the theory of Huffman codes to develop a deterministic
construction of a sequence of binary sampling vectors $a_i$, i.e., the
entries of $a_i$ are 0 or 1. Moreover, unlike standard approaches, the
sampling procedure is adaptive. We assume that the signal
$x \in \mathbb{R}^n$ is an instance of a vector random variable
$X = (X_1, \dots, X_n)^t$ and we construct the $i$-th row $a_i$ of $A$
using the sample $y_{i-1} = \langle a_{i-1}, x\rangle$. Our goal is to
make the average number of samples needed to determine a signal $x$ as
small as possible, and to make the reconstruction of $x$ from those
samples as fast as possible. We take advantage of the probability
distribution of the random vector $X$ to minimize the average number of
samples needed to uniquely determine the unknown signal $x$. In our
method, rather than constructing a fixed set of sampling vectors
forming the rows of a single sampling matrix $A$ for all possible
signals, we construct $s$ sequences of sampling vectors. Each sequence
focuses on finding exactly one nonzero component of the signal $x$.

It is remarkable that the expected total cost of the combined sampling
and reconstruction algorithms for an $s$-sparse vector $x$ is no more
than $s \log n + 2s$. If no information is available about the
probability distribution of the random vector $X$, we can always assume
that we have a uniform distribution, in which case the total cost of
the combined sampling and reconstruction algorithms for an $s$-sparse
vector $x$ is equal to $s \log n + 2s$ even if the uniform assumption
is erroneous.

This paper is organized as follows: In Section 2 we introduce the
basic notation and definitions for sparse random vectors and trees.
The new notions of Huffman tree and Huffman sampling vectors are
introduced in Sections 3.1 and 3.2. In Sections 3.3 and 3.4, we
describe an information theoretic method for finding $s$-sparse vectors
in $\mathbb{R}^n$. In Section 3.5 we describe a variation for finding
$s$-sparse vectors from noisy measurements. Section 4 is devoted to
examples, simulations, and testing of the algorithms on synthetic data.

2. NOTATION AND PRELIMINARIES 

In this section we introduce the necessary notations and preliminaries 
needed in subsequent sections. 

2.1. Sparse random vectors. 

(1) We will use the notation $X = (X_1, \dots, X_n)^t$ to denote a
vector of $n$ random variables. An instance $x \in \mathbb{R}^n$ of $X$
will be called a signal.

(2) We will say that a signal $x \in \mathbb{R}^n$ has sparsity
$s \le n$ if $x$ has at most $s$ nonzero components ($\|x\|_0 \le s$).
(3) Let $\Omega = \{1, \dots, n\}$ be the set of all indices, and
$\Lambda \subset \Omega$ be a subset of indices. Then
$P_\Lambda = P(\Lambda)$ will denote the probability of $X$ having
nonzero components exactly at coordinates indexed by $\Lambda$, i.e.,
$P_\Lambda = \Pr\{X_i \neq 0,\ i \in \Lambda;\ X_i = 0,\ i \in \Lambda^c\}$.
Here $\Lambda^c$ denotes the complement of $\Lambda$.

(4) The pair $(X, P)$ will be used to denote the random vector $X$
together with the probability mass function $P$ on the sample space
$2^\Omega = 2^{\{1,\dots,n\}}$. Thus $\sum_{\Lambda \subset \Omega} P_\Lambda = 1$.

(5) We will say that a random vector $X$ is $s$-sparse if
$P_\Lambda = 0$ for all $\Lambda$ with cardinality strictly larger than
$s$, i.e., $\#(\Lambda) > s$ implies $P_\Lambda = 0$.

(6) We will need the probabilities $q_\Lambda = q(\Lambda) = \Pr(E_\Lambda)$
for the events $E_\Lambda = \{X_i \neq 0 \text{ for some } i \in \Lambda\}$,
i.e., $q_\Lambda$ is the probability that at least one of the
components of $X$ with index in $\Lambda$ is nonzero. Note that $q$ can
be computed from $P$ by

(2.1) $q_\Lambda = \sum_{\Gamma \cap \Lambda \neq \emptyset} P_\Gamma$.

2.2. Trees. 

(1) We consider finite full binary trees in which every node has zero 
or two children. 

(2) The root of the tree is the node with no parent nodes. 

(3) A leaf is a node with no children. 

(4) A left (right) subtree of a node v in a rooted binary tree is the 
tree whose root is the left (right) child of this node v. 

(5) The set of all nodes that can be reached from the root by a 
path of length L are said to be at level L. 

2.3. Other notations. 

(1) The notation $\chi_\Lambda$ will denote the characteristic function
of a set $\Lambda$, i.e., $\chi_\Lambda(i) = 1$ for $i \in \Lambda$ and
$\chi_\Lambda(i) = 0$ for $i \notin \Lambda$.

(2) For a set $\Lambda$, $|\Lambda|$ will denote its cardinality.



3. Theory

In this section we describe our approach explicitly.

3.1. Huffman tree. Let $(X, P)$ be an $s$-sparse random vector $X$ in
$\mathbb{R}^n$ together with the probability mass function $P$ on the
sample space $2^\Omega = 2^{\{1,\dots,n\}}$, the set of all subsets of
$\Omega = \{1, \dots, n\}$. We define a Huffman tree to be a binary
tree whose leaves are the sets $\{1\}, \dots, \{n\}$. We associate the
probabilities $q_{\{1\}}, \dots, q_{\{n\}}$ to these nodes
respectively. The Huffman tree is constructed from the leaves to the
root as follows: Suppose that the nodes at the $k$-th step are
$\Lambda_1, \dots, \Lambda_r$. Let $i$ and $j$ be such that

$q_{\Lambda_i} = \min_{1 \le \lambda \le r} q_{\Lambda_\lambda}$,

$q_{\Lambda_j} = \min_{1 \le \lambda \le r,\ \lambda \neq i} q_{\Lambda_\lambda}$.

Then the nodes at the $(k+1)$-th step are obtained by replacing
$\Lambda_i$ and $\Lambda_j$ with $\Lambda_i \cup \Lambda_j$ in the list
of the nodes at the $k$-th step, and the probability associated to the
node $\Lambda_i \cup \Lambda_j$ is $q_{\Lambda_i \cup \Lambda_j}$, i.e.,
at each step the two nodes with smallest probabilities are combined.
An illustrative example is shown below.

Figure 1. Huffman tree for the 2-sparse random vector in $\mathbb{R}^4$.

Example 3.1. Assume that $X \in \mathbb{R}^4$ is a 2-sparse random
vector with probability mass function $P$ defined by:
$P_\emptyset = 0.02$, $P_{\{1\}} = 0.07$, $P_{\{2\}} = 0.05$,
$P_{\{3\}} = 0.03$, $P_{\{4\}} = 0.1$, $P_{\{1,2\}} = 0.31$,
$P_{\{1,3\}} = 0.2$, $P_{\{1,4\}} = 0.03$, $P_{\{2,3\}} = 0.06$,
$P_{\{2,4\}} = 0.12$, $P_{\{3,4\}} = 0.01$. The nodes at the first step
are $\{1\}, \dots, \{4\}$. Using equation (2.1), we get that

$q_{\{1\}} = P_{\{1\}} + P_{\{1,2\}} + P_{\{1,3\}} + P_{\{1,4\}} = 0.61$.

Similarly, $q_{\{2\}} = 0.54$, $q_{\{3\}} = 0.3$, and
$q_{\{4\}} = 0.26$. Since $\{3\}$ and $\{4\}$ have the two smallest
probabilities, the nodes at the second step are $\{1\}, \{2\}, \{3,4\}$.
Also $q_{\{3,4\}} = 0.55$ and hence the nodes at the third step are
$\{1\}, \{2,3,4\}$. The root node is $\{1,2,3,4\}$ with probability
$q_{\{1,2,3,4\}} = 1 - P_\emptyset = 0.98$. This completes the Huffman
tree (see Figure 1).
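The construction of Section 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `q_of` and `huffman_merges` are hypothetical, supports are stored as `frozenset` keys, and the mass of the support {1,4} is read as 0.03, the value consistent with $q_{\{1\}} = 0.61$.

```python
def q_of(P, subset):
    """Equation (2.1): total mass of the supports that intersect `subset`."""
    return sum(p for support, p in P.items() if support & subset)


def huffman_merges(P, n):
    """Repeatedly combine the two nodes with the smallest q.

    Returns the merged pairs in the order in which they were combined.
    """
    nodes = [frozenset([i]) for i in range(1, n + 1)]
    merges = []
    while len(nodes) > 1:
        nodes.sort(key=lambda node: q_of(P, node))
        first, second = nodes[0], nodes[1]
        merges.append((first, second))
        nodes = nodes[2:] + [first | second]
    return merges


# Probability mass function of Example 3.1, keyed by support set.
P = {
    frozenset(): 0.02,
    frozenset({1}): 0.07, frozenset({2}): 0.05,
    frozenset({3}): 0.03, frozenset({4}): 0.10,
    frozenset({1, 2}): 0.31, frozenset({1, 3}): 0.20, frozenset({1, 4}): 0.03,
    frozenset({2, 3}): 0.06, frozenset({2, 4}): 0.12, frozenset({3, 4}): 0.01,
}

for first, second in huffman_merges(P, 4):
    print(sorted(first), "+", sorted(second))
```

Run on the data of Example 3.1, the sketch merges {4} with {3}, then {2} with {3,4}, then {1} with {2,3,4}, which matches the steps described in the example.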

3.2. Huffman sampling vectors. In this section we introduce Huffman
sampling vectors. Let $(X, P)$ be as in the previous section. Our goal
is to find a signal $x$ which is an instance of $X$ using, on average,
a minimum number of samples. The average number of samples is measured
by

(3.1) $L(X, P) = \sum_\Lambda P_\Lambda \ell_\Lambda$,

where $\ell_\Lambda$ is the number of samples for finding one nonzero
component of $x$ whose components satisfy $x_i \neq 0$ for
$i \in \Lambda$ and $x_i = 0$ for $i \in \Lambda^c$.

As mentioned before, we first focus on finding one nonzero component,
if it exists. Other nonzero components are then iteratively found one
at a time in a similar manner until all of the nonzero components are
exhausted.

We first construct a Huffman tree associated with the random vector
$X$. Each node $\Lambda$ which is not a leaf has two children
$\Lambda_1$ and $\Lambda_2$. Note that
$\Lambda_1 \cap \Lambda_2 = \emptyset$ and
$\Lambda_1 \cup \Lambda_2 = \Lambda$. We denote

$\ell_{\Lambda_1} = q_{\Lambda_1}(\log|\Lambda_1| + 1) + (1 - q_{\Lambda_1})(\log|\Lambda_2| + 1)$,

$\ell_{\Lambda_2} = q_{\Lambda_2}(\log|\Lambda_2| + 1) + (1 - q_{\Lambda_2})(\log|\Lambda_1| + 1)$.

We associate a sampling vector to such nodes $\Lambda$ by:

(3.2) $a_\Lambda = \chi_{\Lambda_1}$ if $\ell_{\Lambda_1} \le \ell_{\Lambda_2}$,
      $a_\Lambda = \chi_{\Lambda_2}$ if $\ell_{\Lambda_1} > \ell_{\Lambda_2}$,

i.e., for $\ell_{\Lambda_1} \le \ell_{\Lambda_2}$, we have
$a_\Lambda(i) = 1$ for $i \in \Lambda_1$ and $a_\Lambda(i) = 0$ for
$i \in \Omega - \Lambda_1$, and for $\ell_{\Lambda_1} > \ell_{\Lambda_2}$
we have $a_\Lambda(i) = 1$ for $i \in \Lambda_2$ and $a_\Lambda(i) = 0$
for $i \in \Omega - \Lambda_2$.

The choice of the sampling vector in (3.2) can be motivated as follows:
Since our goal is to find the nonzero component as quickly as possible,
it seems that we would need $a_\Lambda = \chi_{\Lambda_i}$ for the set
$\Lambda_i$ with the highest probability $q_{\Lambda_i}$, $i = 1, 2$.
However, the set $\Lambda_i$ with the highest probability
$q_{\Lambda_i}$ may also have a large number of elements. Thus the
choice should be a compromise between the size of the set and its
probability. The reason for this particular choice of $a_\Lambda$ will
be apparent from the theorems and their proofs below.

3.3. Determination of a sparse vector x using Huffman sampling
vectors. Let $x$ be an $s$-sparse signal in $\mathbb{R}^n$ which is an
instance of $(X, P)$ (we will write $x \sim (X, P)$). We make the
additional assumption that the conditional probability
$\Pr(\sum_{i \in \Lambda} X_i \neq 0 \mid X_i \neq 0,\ i \in \Lambda) = 1$
holds for any $\Lambda \subset \Omega$ (recall that
$\Omega = \{1, \dots, n\}$). This is a natural condition if the random
variables $X_i$, $i = 1, \dots, n$, in the random vector $X$ do not
have a positive mass concentration except possibly at zero.

3.3.1. Finding a nonzero component. Algorithm 1 below is used to find
the position and the corresponding value of one of the nonzero
components of $x$ (if any).
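Algorithm 1.

(1) Initialization: $\Lambda = \Omega$;

(2) Repeat until $|\Lambda| = 1$ (the children $\Lambda_1$, $\Lambda_2$
    of $\Lambda$ are labelled so that $a_\Lambda = \chi_{\Lambda_1}$)

      if $\langle a_\Lambda, x\rangle \neq 0$, $\Lambda = \Lambda_1$
      else $\Lambda = \Lambda_2$
    end repeat

(3) Output the (only) element $t_1 \in \Lambda$

(4) Output $x_{t_1} = \langle \chi_{\{t_1\}}, x\rangle$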

Remark 1.

(i) If the vector $x = 0$, then the algorithm will output
$x_{t_1} = 0$; otherwise it will output the value $x_{t_1}$ of one of
the nonzero components of $x$ and its index $t_1$.

(ii) Note that the last inner product in (4) is not always necessary
since we have all the information needed to find $x_{t_1}$ from the
samples and the value of $t_1$. However, this would involve solving a
linear system of equations obtained from the sampling scheme. Thus this
one extra sample can be considered as a reconstruction step.

(iii) Note that the sampling vectors depend on the instance $x$, i.e.,
the sampling vectors are adaptive.

(iv) The number of possible sampling vectors is equal to the number of
nodes in the Huffman tree, but only a subset of these vectors is used
to determine a nonzero component of a given instance vector $x$.

(v) If $P(X = 0) > 0$, then we choose the first sampling vector
$a = (1, 1, \dots, 1)$. If $\langle a, x\rangle = 0$ we are done.
Otherwise we proceed with Algorithm 1.
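As an illustration of Algorithm 1, the following Python sketch builds the tree of Section 3.1, picks the sampling vector by (3.2) at every node, and descends to a leaf. It is a sketch under assumptions, not the authors' code: the helper names are hypothetical, the signal is a dict from index to value, the logarithm is taken in base 2, and exact arithmetic comparisons with zero stand in for the assumption that sums of nonzero components do not cancel.

```python
from math import log2


def q_of(P, subset):
    """Equation (2.1): probability that some index in `subset` is nonzero."""
    return sum(p for support, p in P.items() if support & subset)


def build_tree(P, indices):
    """Huffman tree as tuples (index_set, child, child); leaves have None."""
    nodes = [(frozenset([i]), None, None) for i in indices]
    while len(nodes) > 1:
        nodes.sort(key=lambda node: q_of(P, node[0]))
        a, b = nodes[0], nodes[1]
        nodes = nodes[2:] + [(a[0] | b[0], a, b)]
    return nodes[0]


def ell(q_sampled, size_sampled, size_other):
    return (q_sampled * (log2(size_sampled) + 1)
            + (1 - q_sampled) * (log2(size_other) + 1))


def find_one_nonzero(x, P):
    """Return (index, value, number of samples used, including step (4))."""
    node = build_tree(P, sorted(x))
    samples = 0
    while node[1] is not None:                      # until |Lambda| = 1
        first, second = node[1], node[2]
        cost_first = ell(q_of(P, first[0]), len(first[0]), len(second[0]))
        cost_second = ell(q_of(P, second[0]), len(second[0]), len(first[0]))
        if cost_second < cost_first:                # relabel: a = chi_first
            first, second = second, first
        y = sum(x[i] for i in first[0])             # sample <a_Lambda, x>
        samples += 1
        node = first if y != 0 else second
    (t,) = node[0]
    samples += 1                                    # extra sample for the value
    return t, x[t], samples


# Probability mass function of Example 3.1 (mass of {1,4} read as 0.03).
P = {
    frozenset(): 0.02,
    frozenset({1}): 0.07, frozenset({2}): 0.05,
    frozenset({3}): 0.03, frozenset({4}): 0.10,
    frozenset({1, 2}): 0.31, frozenset({1, 3}): 0.20, frozenset({1, 4}): 0.03,
    frozenset({2, 3}): 0.06, frozenset({2, 4}): 0.12, frozenset({3, 4}): 0.01,
}

print(find_one_nonzero({1: 2.0, 2: 0.0, 3: 0.0, 4: 0.0}, P))
```

A signal supported on index 1 is located with a single position sample, while a signal supported on index 3 needs three position samples, reflecting that likelier positions are reached sooner.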

The first observation is that Algorithm 1 is optimal for 1-sparse
vectors. This should not be a surprise since the algorithm was inspired
by the theory of Huffman codes. We have

Theorem 3.2. Let $x \sim (X, P)$ be a 1-sparse vector in
$\mathbb{R}^n$. Then the average number of samples $L_1(X, P)$ needed
to find $x$ using Algorithm 1 is less than or equal to the average
number of samples $L_{\mathcal{A}}(X, P)$ for finding $x$ using any
algorithm $\mathcal{A}$ with binary sampling vectors.

Proof. Let $E = \{e_i\}_{i=1}^n$ be the canonical basis for
$\mathbb{R}^n$. We first note that for any sampling algorithm with
binary sampling vectors, the number of samples required to determine
any nonzero multiple $\alpha e_i$ of $e_i$ ($\alpha \in \mathbb{R}$) is
equal to the number required to determine $e_i$. Hence for the
remainder of this proof we will assume that $x$ is binary, i.e., $x$ is
one of the canonical vectors $e_i$. Since any sampling vector $a$ from
an algorithm $\mathcal{A}$ is binary, the inner product
$\langle a, x\rangle$ is either zero or one, i.e., binary. For each
$e_i$, $i \in \Omega = \{1, \dots, n\}$, a binary algorithm uses a
sequence of vectors $\{a^i_1, \dots, a^i_{\ell_i}\}$ where $\ell_i$ is
the number of sampling vectors needed to determine $e_i$. We associate
to each $e_i$ the binary sequence
$c^i = y^i_1 y^i_2 \dots y^i_{\ell_i}$ where
$y^i_j = \langle a^i_j, e_i\rangle$. In this way each canonical vector
is associated to a unique sequence $c^i$, i.e., $c^i \neq c^j$ if
$i \neq j$. Hence $\{c^i\}$ is a binary code for the vectors of the
canonical basis of $\mathbb{R}^n$. The code is a prefix code (also
known as an instantaneous code, i.e., no codeword is a prefix of any
other codeword [CT91]) since a binary algorithm terminates after
finding the nonzero component (which corresponds to the shorter code).
Since $x$ is 1-sparse, we have that $q_{\{i\}} = P_{\{i\}}$, which is
the probability of component $i$ being nonzero and all other components
being zero. Hence to each algorithm $\mathcal{A}$ with binary sampling
vectors, we associate a prefix code for $E = \{e_i\}_{i=1}^n$.
Consequently, the average number of samples required by the algorithm
$\mathcal{A}$ is the same as the average length of the code:

$L_{\mathcal{A}}(X, P) = \sum_{i=1}^n \ell_i P_{\{i\}}$.

From the construction, Algorithm 1 is associated with the Huffman code,
whose average length is the shortest. Hence
$L_1(X, P) \le L_{\mathcal{A}}(X, P)$ for any $\mathcal{A}$. □
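The correspondence used in the proof can be made concrete with a standard Huffman code computation. The sketch below is illustrative only: the function name `huffman_lengths` and the probabilities are hypothetical, and it computes only the number of position samples, not the extra sample for the value.

```python
import heapq


def huffman_lengths(probs):
    """Code length of each symbol in a binary Huffman code."""
    # Heap entries: (probability, tie breaker, symbols below this node).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # every symbol below the merge gets one more bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


# Hypothetical position probabilities of a 1-sparse vector in R^4.
probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_lengths(probs)
average = sum(p * l for p, l in zip(probs, lengths))
print(lengths, average)   # prints [1, 2, 3, 3] 1.75
```

For this distribution the average of 1.75 samples is below the 2 samples that a fixed, non-adaptive binary search over four positions would use.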

The 1-sparse case is a special case. It is optimal because the sampling
scheme can be associated exactly with the Huffman codes and we have
that $q_\Lambda = P_\Lambda$ and
$\sum_i q_{\{i\}} = \sum_i P_{\{i\}} = 1$. In fact, the sampling vector
$a_\Lambda$ in (3.2) can be chosen to be either $\chi_{\Lambda_1}$ or
$\chi_{\Lambda_2}$ independently of the values of $\ell_{\Lambda_1}$,
$\ell_{\Lambda_2}$. However, for the general $s$-sparse case, we do not
have $q_\Lambda = P_\Lambda$ anymore, and the sampling vectors cannot
be associated with Huffman codes directly. Thus for the $s$-sparse
case, the average number of sampling vectors is not necessarily optimal
and we need to estimate this number to have confidence in the
algorithm.



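From the construction of the Huffman tree in Section 3.1, it is not
difficult to see that there is at most one node $\Lambda$ (called
special node) with children $\Lambda_1$ and $\Lambda_2$ such that
$(\frac{1}{2} - q_{\Lambda_1})(\frac{1}{2} - q_{\Lambda_2}) < 0$, i.e.,
except for possibly the special node, all other nodes have the property
that $q_{\Lambda_1}$, $q_{\Lambda_2}$ are either both larger than $1/2$
or both smaller than $1/2$. Thus, a Huffman tree can have at most one
special node. We have the following lemmas:

Lemma 3.3. For any fixed node $\Lambda$ with children $\Lambda_1$ and
$\Lambda_2$, if $\Lambda$ is not a special node, then
$\min\{\ell_{\Lambda_1}, \ell_{\Lambda_2}\} \le \log|\Lambda|$.

Proof. Without loss of generality, we assume that
$|\Lambda_1| \le |\Lambda_2|$. Hence
$\frac{|\Lambda_1|}{|\Lambda|} \le \frac{1}{2}$ and
$\frac{|\Lambda_2|}{|\Lambda|} \ge \frac{1}{2}$.

Consider the function $f_q(x) = x^q (1 - x)^{1-q}$ for $x \in [0, 1]$
and $q \in [0, 1]$. Easy computations show that

(3.3) $f_q(x) \le \frac{1}{2}$, for $q \le \frac{1}{2}$ and $x \ge \frac{1}{2}$,

and

(3.4) $f_q(x) \le \frac{1}{2}$, for $q \ge \frac{1}{2}$ and $x \le \frac{1}{2}$.

Since $\Lambda$ is not a special node, we have that
$(q_{\Lambda_1} - \frac{1}{2})(q_{\Lambda_2} - \frac{1}{2}) \ge 0$.

If $q_{\Lambda_1} \le \frac{1}{2}$ and $q_{\Lambda_2} \le \frac{1}{2}$,
then using the fact that $\frac{|\Lambda_2|}{|\Lambda|} \ge \frac{1}{2}$
and (3.3), we get
$f_{q_{\Lambda_2}}(\frac{|\Lambda_2|}{|\Lambda|}) \le \frac{1}{2}$,
that is

$\left(\frac{|\Lambda_2|}{|\Lambda|}\right)^{q_{\Lambda_2}} \left(1 - \frac{|\Lambda_2|}{|\Lambda|}\right)^{1 - q_{\Lambda_2}} \le \frac{1}{2}$,

which implies that

$2 |\Lambda_2|^{q_{\Lambda_2}} |\Lambda_1|^{1 - q_{\Lambda_2}} \le |\Lambda|$.

After taking the logarithm on both sides, we have

$\ell_{\Lambda_2} \le \log|\Lambda|$.

A similar calculation for the case $q_{\Lambda_1} \ge \frac{1}{2}$ and
$q_{\Lambda_2} \ge \frac{1}{2}$, using (3.4), yields

$\ell_{\Lambda_1} \le \log|\Lambda|$.

In either case, we have that
$\min\{\ell_{\Lambda_1}, \ell_{\Lambda_2}\} \le \log|\Lambda|$. □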
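A quick numerical check can make the role of the special node tangible. The sketch below is only a sanity check over a small grid, not a proof; the function names are hypothetical and the logarithm is assumed to be base 2.

```python
from math import log2


def ell(q_sampled, size_sampled, size_other):
    return (q_sampled * (log2(size_sampled) + 1)
            + (1 - q_sampled) * (log2(size_other) + 1))


def min_cost(size1, size2, q1, q2):
    """min{l_1, l_2} for a node with children of the given sizes and q's."""
    return min(ell(q1, size1, size2), ell(q2, size2, size1))


# Grid check of the bound of Lemma 3.3 on non-special nodes.
grid = [k / 10 for k in range(11)]
violations = [
    (a, b, q1, q2)
    for a in range(1, 9) for b in range(1, 9)
    for q1 in grid for q2 in grid
    if (q1 - 0.5) * (q2 - 0.5) >= 0                  # node is not special
    and min_cost(a, b, q1, q2) > log2(a + b) + 1e-12
]
print(len(violations))   # prints 0

# A special node: sizes 7 and 1 with q = 0.9 and q = 0.1.
special = min_cost(7, 1, 0.9, 0.1)
print(special > log2(8), special <= log2(8) + 1)   # prints True True
```

The last line shows a special node for which the bound $\log|\Lambda|$ fails while the weaker bound $\log|\Lambda| + 1$ still holds.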

For a general node, we have the following.

Lemma 3.4. For any fixed node $\Lambda$ with children $\Lambda_1$ and
$\Lambda_2$, we have that
$\min\{\ell_{\Lambda_1}, \ell_{\Lambda_2}\} \le \log|\Lambda| + 1$.

Proof. For any $x \in [0, 1]$ and any $q \in [0, 1]$ we have that
$f_q(x) = x^q (1 - x)^{1-q} \le 1$. We use this inequality for
$x = \frac{|\Lambda_1|}{|\Lambda|}$ and $q = q_{\Lambda_1}$, or for
$x = \frac{|\Lambda_2|}{|\Lambda|}$ and $q = q_{\Lambda_2}$, to obtain
the result. □

Lemma 3.5. Given a nonzero $s$-sparse vector $x \sim (X, P)$ in
$\mathbb{R}^n$, if the Huffman tree associated with $x$ has no special
node, then the average number of samples $L$ needed to find the
position of one nonzero component of $x$ using Algorithm 1 is at most
$\log n$.

Proof. We will use induction on $n$ to prove this lemma. Suppose
$\Lambda = \{1, \dots, n\}$ and $\Lambda$ has children $\Lambda_1$ and
$\Lambda_2$.

For $n = 2$, $|\Lambda| = 2$, we only need one vector,
$\chi_{\Lambda_1}$ or $\chi_{\Lambda_2}$, to determine the position of
one nonzero component of $x$. Hence the lemma holds trivially for this
case.

Now assume the lemma is true for $n = k, k-1, \dots, 2$; we want to
show it is also true for $n = k + 1$.

If $|\Lambda| = k + 1$, we must have $|\Lambda_1| \le k$ and
$|\Lambda_2| \le k$. Without loss of generality, suppose
$\ell_{\Lambda_1} \le \ell_{\Lambda_2}$. By Algorithm 1,
$a_\Lambda = \chi_{\Lambda_1}$. Then with probability $q_{\Lambda_1}$,
we have $\langle a_\Lambda, x\rangle \neq 0$, in which case we need (on
average) another $L_{\Lambda_1}$ sampling vectors. With probability
$1 - q_{\Lambda_1}$, we have $\langle a_\Lambda, x\rangle = 0$, and we
need (on average) another $L_{\Lambda_2}$ sampling vectors. By the
induction hypothesis, we have $L_{\Lambda_1} \le \log|\Lambda_1|$ and
$L_{\Lambda_2} \le \log|\Lambda_2|$.

Since by assumption the tree has no special node, using Lemma 3.3 we
deduce that the average number of sampling vectors we need is

$L = q_{\Lambda_1}(1 + L_{\Lambda_1}) + (1 - q_{\Lambda_1})(1 + L_{\Lambda_2})$
$\le q_{\Lambda_1}(1 + \log|\Lambda_1|) + (1 - q_{\Lambda_1})(1 + \log|\Lambda_2|)$
$= \ell_{\Lambda_1} \le \log|\Lambda| = \log(k + 1)$.

□

We are now ready to find an upper bound on the average number of
sampling vectors needed for finding the position of one nonzero
component in an $s$-sparse signal using Algorithm 1. Denoting by
$T_\Lambda$ the subtree with $\Lambda$ as the root, we have

Theorem 3.6. Given a nonzero $s$-sparse vector $x \sim (X, P)$ in
$\mathbb{R}^n$, the average number of samples $L$ needed to find the
position of one nonzero component of $x$ using Algorithm 1 is at most
$\log n + 1$.

Proof. We will use induction on $n$. The theorem holds trivially for
$n = 2$.

Now assume the theorem is true for $n = k, k-1, \dots, 2$; we want to
show it is also true for $n = k + 1$.

If $|\Lambda| = k + 1$, we must have $|\Lambda_1| \le k$ and
$|\Lambda_2| \le k$. Without loss of generality, suppose
$\ell_{\Lambda_1} \le \ell_{\Lambda_2}$. By Algorithm 1, the average
number $L$ of sampling vectors needed is

(3.5) $L = q_{\Lambda_1}(1 + L_{\Lambda_1}) + (1 - q_{\Lambda_1})(1 + L_{\Lambda_2})$.

Since the Huffman tree can have at most one special node, we consider
three cases:

Case (1): The root of the tree $\Lambda = \{1, \dots, n\}$ is a special
node and $\Lambda_1$, $\Lambda_2$ are its children. Then the subtrees
$T_{\Lambda_1}$ and $T_{\Lambda_2}$ have no special nodes. Thus by
Lemma 3.5, we have that $L_{\Lambda_1} \le \log|\Lambda_1|$ and
$L_{\Lambda_2} \le \log|\Lambda_2|$. From Lemma 3.4,
$\ell_{\Lambda_1} \le \log|\Lambda| + 1$. Thus we have that

$L = q_{\Lambda_1}(1 + L_{\Lambda_1}) + (1 - q_{\Lambda_1})(1 + L_{\Lambda_2})$
$\le q_{\Lambda_1}(1 + \log|\Lambda_1|) + (1 - q_{\Lambda_1})(1 + \log|\Lambda_2|)$
$= \ell_{\Lambda_1} \le \log|\Lambda| + 1 = \log(k + 1) + 1$.

Case (2): $\Lambda = \{1, \dots, n\}$ is not a special node and the
subtree $T_{\Lambda_1}$ has no special node. From Lemma 3.5, we have
$L_{\Lambda_1} \le \log|\Lambda_1|$. Since $|\Lambda_2| \le k$, from
the induction hypothesis, we have
$L_{\Lambda_2} \le \log|\Lambda_2| + 1$. From Lemma 3.3,
$\ell_{\Lambda_1} \le \log|\Lambda|$. Thus we have that

$L = q_{\Lambda_1}(1 + L_{\Lambda_1}) + (1 - q_{\Lambda_1})(1 + L_{\Lambda_2})$
$\le q_{\Lambda_1}(1 + \log|\Lambda_1|) + (1 - q_{\Lambda_1})(1 + \log|\Lambda_2| + 1)$
$= \ell_{\Lambda_1} + 1 - q_{\Lambda_1} \le \log|\Lambda| + 1 = \log(k + 1) + 1$.

Case (3): $\Lambda = \{1, \dots, n\}$ is not a special node and the
subtree $T_{\Lambda_2}$ has no special node. Then the same computation
as in Case (2) gives $L \le \log(k + 1) + 1$. □

3.3.2. Iterative step for finding another nonzero component. For every
subset $\omega \subset \Omega$ with $|\omega| < s$ we let
$P^\omega_\Lambda = P^\omega(\Lambda)$ denote the conditional
probabilities

$\Pr\{X_i \neq 0,\ i \in \Lambda \text{ and } X_i = 0,\ i \in \Omega - (\Lambda \cup \omega) \mid X_i \neq 0,\ i \in \omega\}$,

for any $\Lambda \subset \Omega - \omega$.

Similar to Section 3.1, we let
$q^\omega_\Lambda = q^\omega(\Lambda) = \Pr(E^\omega_\Lambda)$ for the
events
$E^\omega_\Lambda = \{X_i \neq 0 \text{ for some } i \in \Lambda \mid X_j \neq 0,\ j \in \omega\}$,
which is the conditional probability that at least one of the
components of $X$ with index in $\Lambda \subset \Omega - \omega$ is
nonzero given that $X_i \neq 0$ for $i \in \omega$. Using the same
procedure as in Section 3.1, we build a Huffman tree with leaves
$\{i\}$, $i \in \Omega - \omega$, with probabilities
$q^\omega_{\{i\}}$. Note that this tree has $n - |\omega|$ leaves.

As above (see (3.2)), we assign a sampling vector
$a^{\Omega-\omega}_\Lambda \in \mathbb{R}^n$ to every node $\Lambda$
which is not a leaf. Note that from the construction of
$a^{\Omega-\omega}_\Lambda$ we have that
$a^{\Omega-\omega}_\Lambda(i) = 0$ for $i \in \omega$.

Let $k < s$ be the number of nonzero components of $x$ that are found
and let $\omega = \{t_1, \dots, t_k\}$ be the set of corresponding
indices. The algorithm for finding the $(k+1)$-th nonzero component of
$x$ (if any) is essentially the same as Algorithm 1.

3.4. Algorithm for finding all nonzero components of x. The general
algorithm for finding the $s$-sparse vector $x \sim (X, P)$, which is
an instance of the $s$-sparse random vector $(X, P)$, can now be
described as follows:

Algorithm 2.

Initialization: $k = 1$; $\omega = \emptyset$;
Repeat until $k > s$ or $x_{t_{k-1}} = 0$

  (1) $\Lambda = \Omega - \omega$;

  (2) repeat until $|\Lambda| = 1$

        if $\langle a^{\Omega-\omega}_\Lambda, x\rangle \neq 0$, $\Lambda = \Lambda_1$;
        else $\Lambda = \Lambda_2$;
      end repeat

  (3) Output the (only) element $t_k \in \Lambda$;

  (4) Output $x_{t_k} = \langle \chi_{\{t_k\}}, x\rangle$;

  (5) $\omega = \omega \cup \{t_k\}$;

  (6) $k = k + 1$;
end repeat

Algorithm 2 repeats Algorithm 1 at most $s$ times and adds one extra
sample to determine the value of each nonzero component once its
position is known. Each of the at most $s$ rounds therefore costs at
most $(\log n + 1) + 1$ samples on average. Thus, as a corollary of
Theorem 3.6 we immediately get

Corollary 3.7. Given a nonzero $s$-sparse vector $x \sim (X, P)$ in
$\mathbb{R}^n$, the average number of sampling vectors $L$ needed to
find all nonzero components of $x$ using Algorithm 2 is at most
$s \log n + 2s$.

Remark 2.

(i) Corollary 3.7 states that the expected total cost (number of
measurements and reconstruction combined) that we need for an
$s$-sparse vector in $\mathbb{R}^n$ using Algorithm 2 is no more than
$s \log n + 2s$.

(ii) If the probability distribution $P$ is uniform then the combined
cost of the measurements and reconstruction is exactly
$s \log n + 2s$.
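The iteration of Sections 3.3.2 and 3.4 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: all function names are hypothetical, the signal is a dict from index to value, the logarithm is base 2, the conditional probabilities are computed by restricting $P$ to the supports that contain the indices already found, and the tree is simply rebuilt in every round.

```python
from math import log2


def q_cond(P, subset, found):
    """P(some index in `subset` is nonzero | all indices in `found` are nonzero)."""
    compatible = {sup: p for sup, p in P.items() if found <= sup}
    total = sum(compatible.values())
    return sum(p for sup, p in compatible.items() if sup & subset) / total


def build_tree(P, indices, found):
    """Huffman tree over `indices` with the conditional probabilities q^omega."""
    nodes = [(frozenset([i]), None, None) for i in indices]
    while len(nodes) > 1:
        nodes.sort(key=lambda node: q_cond(P, node[0], found))
        a, b = nodes[0], nodes[1]
        nodes = nodes[2:] + [(a[0] | b[0], a, b)]
    return nodes[0]


def ell(q_sampled, size_sampled, size_other):
    return (q_sampled * (log2(size_sampled) + 1)
            + (1 - q_sampled) * (log2(size_other) + 1))


def recover(x, P, s):
    """Return ({index: value} of the nonzero components, samples used)."""
    omega = sorted(x)
    found, values, samples = frozenset(), {}, 0
    while len(found) < min(s, len(omega)):
        node = build_tree(P, [i for i in omega if i not in found], found)
        while node[1] is not None:
            first, second = node[1], node[2]
            c1 = ell(q_cond(P, first[0], found), len(first[0]), len(second[0]))
            c2 = ell(q_cond(P, second[0], found), len(second[0]), len(first[0]))
            if c2 < c1:
                first, second = second, first       # sample the cheaper child
            samples += 1
            node = first if sum(x[i] for i in first[0]) != 0 else second
        (t,) = node[0]
        samples += 1                                # extra sample for the value
        if x[t] == 0:                               # no nonzero component left
            break
        values[t] = x[t]
        found = found | {t}
    return values, samples


# Probability mass function of Example 3.1 (mass of {1,4} read as 0.03).
P = {
    frozenset(): 0.02,
    frozenset({1}): 0.07, frozenset({2}): 0.05,
    frozenset({3}): 0.03, frozenset({4}): 0.10,
    frozenset({1, 2}): 0.31, frozenset({1, 3}): 0.20, frozenset({1, 4}): 0.03,
    frozenset({2, 3}): 0.06, frozenset({2, 4}): 0.12, frozenset({3, 4}): 0.01,
}

print(recover({1: 2.0, 2: -1.0, 3: 0.0, 4: 0.0}, P, s=2))
```

For the most likely support {1,2} of Example 3.1 the sketch uses four samples in total, two per nonzero component.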

3.5. Noisy measurements. In practice, the measurements $\{y_i\}$ may be
corrupted by noise. Typically the noise is modeled as additive and
uncorrelated: $y_i = \langle x, a_i\rangle + \eta_i$ (see [CW08]). For
this case the condition $Az = y$ in (1.1) for the $\ell_1$ minimization
technique is modified to $\|Az - y\|_2 \le \epsilon$ where $\epsilon$
is of the same order as the standard deviation $\sigma_\eta$ of the
noise. With this modification, the $\ell_1$ minimization technique
yields a minimizer $x^*$ satisfying $\|x^* - x\|_2 \le C\epsilon$ where
$C$ is a constant independent of $x$. Similar modifications are made
for the other techniques, e.g., $\ell_q$ minimization (see [FL09]).

Similarly, our algorithm needs to be modified to deal with noisy
measurements. Algorithm 2 can be modified by changing the statement
$\langle a^{\Omega-\omega}_\Lambda, x\rangle \neq 0$ to the statement
$|\langle a^{\Omega-\omega}_\Lambda, x\rangle| > T$, where the
threshold $T$ is of the same order as the standard deviation of $\eta$.

Consider the model $Y = X + \eta$ where the signal
$X \sim N(0, \sigma_X^2)$ and the noise $\eta \sim N(0, \sigma_\eta^2)$.
Then $Y \sim N(0, \sigma_Y^2)$ with
$\sigma_Y^2 = \sigma_X^2 + \sigma_\eta^2$. We set the threshold in
Algorithm 2 to be $T = E(|\eta|) = \sqrt{2/\pi}\,\sigma_\eta$, and
consider a measure of error (for one sample) given by the probability

$p(e) = P(|Y| < T \text{ and } |X| > T) + P(|Y| > T \text{ and } |X| < T)$.

After easy computation, we have that

$p(e) = \mathrm{erf}\left(\frac{\sigma_\eta}{\sqrt{\pi}\sigma_X}\right) + \mathrm{erf}\left(\frac{\sigma_\eta}{\sqrt{\pi}\sigma_Y}\right) - \mathrm{erf}\left(\frac{\sigma_\eta}{\sqrt{\pi}\sigma_X}\right)\mathrm{erf}\left(\frac{\sigma_\eta}{\sqrt{\pi}\sigma_Y}\right)$,

where $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$ is
the error function. Using the Taylor series, we obtain

$p(e) = \frac{4t}{\pi} + O(t^2)$,

where $t = \sigma_\eta / \sigma_X$ is the ratio of the standard
deviations of the noise and the signal. Thus, for a relatively large
signal to noise ratio, $t$ will be small and we get that the
probability of at least one error in the sampling-reconstruction for an
$s$-sparse vector is bounded above by the quantity

$p_s(e) = 1 - (1 - p(e))^{s(\log n + 1)} = s(\log n + 1)\,p(e) + O(p(e)^2) \approx s(\log n + 1)\frac{4t}{\pi}$.

It can be seen that $p_s(e)$ is essentially linear in the sparsity $s$,
linear in $t$ and logarithmic in the dimension $n$, as can also be seen
in the simulations below.
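The single-sample error event can also be estimated empirically. The following Monte Carlo sketch estimates $p(e)$ directly from its definition; it is an illustration under assumptions, not part of the original analysis: the function name `error_rate` is hypothetical, the second parameters of the Gaussians are treated as standard deviations, and the threshold is taken to be $T = E|\eta| = \sigma_\eta\sqrt{2/\pi}$.

```python
import random
from math import pi, sqrt


def error_rate(sigma_x, sigma_eta, trials=20000, seed=0):
    """Fraction of trials in which thresholding Y and X at T disagree."""
    rng = random.Random(seed)
    threshold = sigma_eta * sqrt(2 / pi)     # mean absolute value of the noise
    errors = 0
    for _ in range(trials):
        x = rng.gauss(0.0, sigma_x)          # signal component
        y = x + rng.gauss(0.0, sigma_eta)    # noisy measurement
        # Error: exactly one of |Y|, |X| falls below the threshold.
        if (abs(y) < threshold) != (abs(x) < threshold):
            errors += 1
    return errors / trials


for ratio in (0.05, 0.1, 0.2, 0.4):
    print(ratio, error_rate(1.0, ratio))
```

The estimated rate grows with the noise-to-signal ratio $t$, which is the qualitative behavior the section describes for small $t$.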

3.6. Stability and compressible signals. One of the advantages of the
standard compressed sensing methods is that they produce almost optimal
results for signals that are not $s$-sparse. For example, if we let
$\beta_s(x)$ denote the smallest possible error (in the $\ell_1$ norm)
that can be achieved by approximating a signal $x \in \mathbb{R}^n$ by
an $s$-sparse vector $z$:

$\beta_s(x) := \inf\{\|x - z\|_1 : \|z\|_0 \le s\}$,

then the vector $x^*$, solution to the $\ell_1$ reconstruction method
(1.1), is quasi-optimal in the sense that
$\|x - x^*\|_1 \le C\beta_s(x)$ for some constant $C$ independent of
$x$. Since for a given $x \in \mathbb{R}^n$, the quantity $\beta_s(x)$
is the $\ell_1$ norm of the smallest $n - s$ components of $x$, the
previous result means that if $x$ is not $s$-sparse, then $x^*$ is
close to the $s$-sparse vector $x_s$ whose components are the $s$
largest components of $x$. In particular, if $x$ is $s$-sparse, then
$x^* = x$.

Clearly, our current approach cannot produce similar results since, in
its current form, our method does not have any incentive for finding
the largest nonzero components. However, $x$ can be decomposed into
$x = x_s + (x - x_s)$, and a measurement
$y = \langle a_\Lambda, x_s\rangle + \eta_\Lambda$, where
$\eta_\Lambda = \langle a_\Lambda, x - x_s\rangle$, can be viewed as a
noisy measurement of $x_s$. Thus, a possible modification of the method
is to replace $\langle a^{\Omega-\omega}_\Lambda, x\rangle \neq 0$ by
$|\langle a^{\Omega-\omega}_\Lambda, x\rangle| > A_k(\Lambda)$ in
Algorithm 2, where $A_k(\Lambda)$ depends on the characteristics of the
random variables $X_i$ and on $\Lambda$. However, such a modification
will not be studied in this paper.

4. Examples and Simulations 

In this section, we provide some examples and test our algorithm on
synthetic data. In the first experiment we use an exponential
distribution to generate the positions of the nonzero components of the
$s$-sparse vectors, and a uniform distribution for their values. The
signal $x$ is generated by first generating an integer index
$i \in [1, n]$ using the exponential pdf with a mean of 10, and then
constructing the component $x(i) = A(\mathrm{rand} - 0.5)$, where rand
is a random variable with uniform distribution in $[0, 1]$. In all the
other experiments, we use a uniform probability distribution for both
signal and noise. The signal $x$ is generated by first generating an
integer index $i \in [1, n]$ with a uniform distribution, and then
constructing the component $x(i) = A(\mathrm{rand} - 0.5)$, where rand
is as above. This process is repeated $s$ times for $s$-sparse signals.
Each additive noise $\eta_\Lambda$ is generated by
$\eta_\Lambda = N(\mathrm{rand} - 0.5)$ and added to the measurement
$y_\Lambda$. All the experiments are done using Matlab 7.4 on a
Macintosh MacBook Pro with a 2.16 GHz Intel Core Duo processor and 1 GB
of 667 MHz RAM.
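The data generation for the uniform experiments can be sketched as follows. This Python sketch is not the Matlab code used for the experiments; the function names `make_signal` and `noisy_sample` are hypothetical, and the parameter values are the ones quoted in the noisy simulations ($n = 512$, $A = 20$, $N = 0.1$, $s = 16$).

```python
import random


def make_signal(n, s, amplitude, rng):
    """Draw s uniform indices in [1, n]; set each to amplitude * (rand - 0.5)."""
    x = [0.0] * (n + 1)          # index 0 is unused so positions run from 1 to n
    for _ in range(s):
        i = rng.randint(1, n)    # repeated indices are possible, as in the text
        x[i] = amplitude * (rng.random() - 0.5)
    return x


def noisy_sample(x, support, noise_level, rng):
    """Binary measurement over `support` plus noise N * (rand - 0.5)."""
    return sum(x[i] for i in support) + noise_level * (rng.random() - 0.5)


rng = random.Random(1)
x = make_signal(n=512, s=16, amplitude=20.0, rng=rng)
y = noisy_sample(x, support=range(1, 257), noise_level=0.1, rng=rng)
print(sum(1 for v in x if v != 0.0), y)
```

Because indices are drawn independently, a signal generated this way has at most $s$ nonzero components, and each noisy sample differs from the exact inner product by at most $N/2$.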

4.1. Noiseless cases. 

Simulation 4.1. Our first experiment is a sparse vector
$x \sim (X, P)$ in a space of dimension $n = 2^{15}$ with an
exponential pdf with mean 10 for the location of the nonzero components
and a uniform distribution for the values of the components as
described above. We have tested our algorithm with
$s = 1, 3, 5, 7, 9, 11, 13$. The mean and variance of the number of
sampling vectors needed for the various sparsities $s$ (for the
combined sampling and reconstruction) are shown in Table 1.

  s          1       3       5       7       9       11      13
  s log n    15      45      75      105     135     165     195
  Mean       9.11    27.5    46.17   61.47   81.26   97.2    111.88
  Var        10.15   16.6    25.5    25.88   31.88   30.94   37.37

Table 1. Mean and variance of the number of sampling vectors as
functions of the sparsity s for n = 2^15 = 32768.

Simulation 4.2. Our second example is a sparse random vector $X$ in a
space of dimension $n = 1024$ with a uniform probability distribution.
We have tested our algorithm with
$s = 1, 25, 50, 75, 100, 125, 150$. The time for finding the vector $x$
for the various sparsities $s$ is shown in Table 2. It is clear from
the table that the method is fast.

  s          1       25      50      75      100     125     150
  CPU time   0.0045  0.028   0.049   0.073   0.098   0.144   0.146

Table 2. CPU time as a function of sparsity s for n = 1024.

4.2. Noisy measurements.

Simulation 4.3. In this test we fix the following values: $n = 512$,
$A = 20$, $N = 0.1$. For each value of $s$, we construct 100 $s$-sparse
signals in $\mathbb{R}^n$. We test the effect of $s$ on the $\ell_2$
relative error $\|x - \hat{x}\|_2 / \|x\|_2$ (in percent), where
$\hat{x}$ denotes the reconstructed signal, as a function of the
sparsity $s$. The results are displayed in Figure 2. The experiments
show that the relative error increases linearly with $s$.

Figure 2. Relative $\ell_2$ error of reconstruction from noisy
measurements (horizontal axis: sparsity).

Simulation 4.4. In this test we fix the following values: $n = 512$,
$A = 20$, $s = 16$. We test the effect of noise on the $\ell_2$
relative error $\|x - \hat{x}\|_2 / \|x\|_2$ (in percent), where
$\hat{x}$ denotes the reconstructed signal, as a function of the value
$N$ of the noise. The results are displayed in Figure 3. The
experiments suggest that the relative error increases linearly with
$N$.

Figure 3. Relative $\ell_2$ error of reconstruction from noisy
measurements.

Simulation 4.5. In this test we fix the following values: $A = 20$,
$N = 0.1$, $s = 8$. We test the effect of $n = 2^r$ on the $\ell_2$
relative error $\|x - \hat{x}\|_2 / \|x\|_2$ (in percent), where
$\hat{x}$ denotes the reconstructed signal, as a function of the value
$r$. The results are displayed in Figure 4.

Figure 4. Relative $\ell_2$ error of reconstruction from noisy
measurements.

5. Conclusion 

We have presented an information theoretic approach to compressed
sampling of sparse signals. Using ideas similar to those in Huffman
coding, we constructed an adaptive sampling scheme for sparse signals.
In our scheme, the sampling vectors are binary and their construction
is deterministic and can be produced explicitly for each $n$. Without
noise the reconstruction is exact, and the average cost for sampling
and reconstruction combined for an $s$-sparse vector is bounded by
$s \log n + 2s$. We have also shown that the method is stable under
noisy measurements. However, the current method and algorithms are not
adapted to compressible signals, and developments for these cases will
be investigated in future research. We hope that the approach will
stimulate further developments and interactions between the areas of
information theory and compressed sampling.